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Abstract 

Gibbs random fields play an important role in statistics, however, the resulting 
likelihood is typically unavailable due to an intractable normalizing constant. Com¬ 
posite likelihoods offer a principled means to construct useful approximations. This 
paper provides a mean to calibrate the posterior distribution resulting from using a 
composite likelihood and illustrate its performance in several examples. 


1 Introduction 

Gibbs random fields play an important and varied role in statistics. The autologistic model 
is used to model the spatial distribution of binary random variables defined on a lattice or 
grid (Besag, 1974). The exponential random graph model or p* model is arguably the most 
popular statistical model in social network analysis (Robins et ah, 2007). Other application 
areas include biology, ecology and physics. 

Despite their popularity, Gibbs random fields present considerable difficulties from the 
point of view of parameter estimation, because the likelihood function is typically intractable 
for all but trivially small graphs. One of the earliest approaches to overcome this difficulty is 
the pseudolikclihood method (Besag, 1975), which replaces the joint likelihood function by 
the product of full-conditional distributions of all nodes. It is natural to consider generaliza¬ 
tions which refine pseudolikelihood by considering products of larger collections of variables. 
The purpose of this paper is to consider such composite likelihood methods. In particular, we 
are interested in their use for Bayesian inference. Friel (2012) focused on a similar problem 
and studied how the size of the collections of variables influence the resulting approximate 
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posterior distribution. Our main contribution is to present an approach to calibrate the 
posterior distribution resulting from using a mis-specihed likelihood function. 

This paper is organised as follows. Section 2 outlines a description of Gibbs random 
fields, and in particular the autologistic distribution. Composite likelihoods are introduced in 
Section 3. Here we focus especially on how to formulate conditional composite likelihoods for 
application to the autologistic model. We also focus on the issue of calibrating the composite 
likelihood function for use in a Bayesian context. Section 5 illustrates the performance of the 
various estimators for simulated data. The paper concludes with some remarks in Section 6. 

2 Discrete-valued Markov random fields 

A Markov random held y is a family of random variables yi indexed by a finite set 5? = 
{1,..., n} of nodes of a graph and taking values from a finite state space W. Here the depen¬ 
dence structure is given by an undirected graph which defines an adjacency relationship 
between the nodes of S?\ by definition i and j are adjacent if and only if they are directly 
connected by an edge in the graph Sf. The likelihood of y given a vector of parameters 
9 — (#i,..., 9d) is defined as 


f(y | 6) cx exp(d T s(y)) := q(y\9), (1) 

where s(y) = (si(y),..., Sd(y)) is a vector of sufficient statistics. However a major issue 
arises due to the fact that the normalizing constant in (1), 

z(ff) = exp (« T s(y)), 

yG'S'’ 

depends on the parameters 9 , and is a summation over all possible realisation of the Gibbs 
random held. Clearly, z{9 ) is intractable for all but trivially small situations. This poses 
serious difficulties in terms of estimating the parameter vector 6. 

One of the earliest approaches to overcome the intractability of (1) is the pseudolikelihood 
method (Besag, 1975) which approximates the joint distribution of y as the product of full- 
conditional distributions for each yi , 


n 

/pseudo ( V ) = f(jji\y—ii ^)) 

i=l 

where y_, ; denotes y\{y*}. This approximation has been shown to lead to unreliable estimates 
of 9 , see for example, Ryden and Titterington (1998), Friel et al. (2009). This is in fact one 
of the earliest composite likelihood approximations, and we will outline work in this area 
further in Section 3. 

The autologistic model, hrst proposed by Besag (1972), is defined on a regular lattice of 
size m x m', where n = mm'. It is used to model the spatial distribution of binary variables, 
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taking values —1,1. The autologistic model is defined in terms of two sufficient statistics, 

n n 

so{y) = ^2yi, s 1 {y) = '^2'22y i y j , 

2=1 j=l .<£ . 

where the notation i ~ j means that lattice point i is connected to lattice point j in . 
Following this notation, the normalizing constant of an autologistic model should be written 
z(9,&), highlighting that it also depends on a graph of dependency. Henceforth we assume 
that the lattice points have been indexed from top to bottom in each column and where 
columns are ordered from left to right. For example, for a first order neighbourhood model 
an interior point y l has neighbours {yi- m , Vi-i, Vi+i, yi+m}- Along the edges of the lattice 
each point has either 2 or 3 neighbours. The full-conditional of y t can be written as 

f{Vi\y-i, &) °c exp(6 » 0 yi + 0 X yi{yi- m + Vi-x + Vi +1 + yi+m)), (2) 

where denotes y excluding yi. As before, the conditional distribution is modified along 
the edges of the lattice. The Hammersley-Clifford theorem (Besag, 1974) shows the equiv¬ 
alence between the model defined in (2) and in (1). The parameter 9q controls the relative 
abundance of —1 and +1 values and the parameter 9\ controls the level of spatial aggregation. 
Note that the Ising model is a special case, resulting from 9 0 = 0. 

The auto-models of Besag (1974) allow variations on the level of dependencies between 
edges and a potential anisotropy can be introduced on the graph. Indeed, consider a set 
of graphs {£fi,..., %}. Each graph of dependency % induces a summary statistic Sk(y) = 
Yl.Vk .ViVi- For example, one can consider an anisotropic configuration of a first order 

3 i j 

neighbourhood model: that is edges of Sfi are all the vertical edges of the lattice and edges of 
are all the horizontal ones. Then an interior point y, : has neighbours {y*_i, yi+i \ according 
to and {yi-m, yi+m} according to Along the edges of the lattice each point has either 
1 or 2 neighbours. This allows to set an interaction strength that differs according to the 
direction. 


3 Composite likelihoods 

There has been considerable interests in composite likelihoods in the statistics literature. 
See, Varin et al. (2011) for a recent overview. Our primary objective is to work with a 
realisation from an autologistic distribution y. According to the previous section we denote 
5? = {1,..., mm!} as an index set for the lattice points. Following Asuncion et al. (2010) 
we consider a general form of composite likelihood written as 

c 

/cl (y | 9) = IJ fiVAi I VBi,0)- 
2=1 


Some special cases arise: 
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1. Ai — A, Bi — 0, C — 1 corresponds to the full likelihood. 

2. Bi = 0 is often termed marginal composite likelihood. 

3. Bi = A \ Ai is often termed conditional composite likelihood. 

The focus of this paper is on conditional composite likelihoods, since the autologistic dis¬ 
tribution is defined in terms of conditional distributions. Note that the pseudolikelihood 
is a special case of 3. where each Ai is a singleton. We restrict each Ai to be of the same 
dimension and in particular to correspond to contiguous square ’blocks’ of lattice points of 
size k x k. In terms of the value of C in case 3., an exhaustive set of blocks would result 
in C — (m — k + 1) x (n — k + 1). In particular, we allow the collection of blocks {Ai} to 
overlap with one another. 


3.1 Bayesian inference using composite likelihoods 

The focus of interest in Bayesian inference is the posterior distribution 

p(6\y)ccf(y \ 9)p(9). (3) 

Our proposal here is to replace the true likelihood f (y \ 9) with a conditional composite 
likelihood, leading us to focus on the approximated posterior distribution 

Pcl{9 | y) oc / CL (y | 6)p(6). 

Surprisingly, there is very little literature on the use of composite likelihoods in the Bayesian 
setting, although Pauli et al. (2011) present a discussion on the use of conditional composite 
likelihoods. Indeed this paper suggests, following Lindsay (1988), that a composite likelihood 
should take the general form 

c 

fcL(y\e) = Y[f(y Al | SB,, 9)", (4) 

i=l 

where Wi are positive weights. In related work, Friel (2012) examined composite likelihood 
for various block sizes when w t — 1. Our paper deals with the issue of calibrating the 
weights. Before focusing on the tuning of Wi, we highlight here the empirical observation 
that non-calibrated composite likelihood leads to an approximated posterior distribution 
with substantially lower variability than the true posterior distribution, leading to overly 
precise precision about posterior parameters, see Figure 1. 


3.2 Computing full-conditional distributions of Ai 

The conditional composite likelihood which we described above relies on evaluating 

t , , m ex P (9 0 s 0 (yAi) + Si(yAi \ V-aJ) 
fiVAily-AijO) - - , „ - r- , 


(5) 
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where 


so (VAi) = ^2vj, SxiyAily-Ai) = EE yeVj- 

j&Ai j&Ai 

Also the normalizing constant now includes the argument yA t emphasising that it involves a 
summation over all possible realisations of sub-lattices defined on the set Ai and conditioned 
on the realised ij-A, , that is conditioned by all the lattice point of y~A t connected to a 
lattice point of ija, by an edge of First we describe an approach to compute the overall 
normalizing constant for a lattice, without any conditioning on a boundary. 

Generalised recursions for computing the normalizing constant of general factorisable 
models such as the autologistic models have been proposed by Reeves and Pettitt (2004). 
This method applies to autologistic lattices with a small number of rows, up to about 20, 
and is based on an algebraic simplification due to the reduction in dependence arising from 
the Markov property, ft applies to un-normalized likelihoods that can be expressed as a 
product of factors, each of which is dependent on only a subset of the lattice sites. We can 
write q(y \ 6) in factorisable form as 

n 

q(y I d ) = n®(tt I 6 )> 

2=1 

where each factor g* depends on a subset y , = yi , yi+i ,..., yi+ m of y, where m is defined to 
be the lag of the model. We may define each factor as 

qi(Vu °) = exp {6 0 yi + 9iyi(y i+ i + yi+m)} (6) 

for all i, except when i corresponds to a lattice point on the last row or last column, in which 
case y l+ i or ?/ i+rn , respectively, drops out of (6). 

As a result of this factorisation, the summation for the normalizing constant, 

n 

z(0,&) = ^2Ylqi(yi I d ) 

y *=i 

can be represented as 

= ^2qn{y n I o)---J2qi(vi I °) ( 7 ) 

Vn 3/1 

which can be computed much more efficiently than the straightforward summation over the 
2 n possible lattice realisations. Full details of a recursive algorithm to compute the above 
can be found in Reeves and Pettitt (2004). Note that this algorithm was extended in Friel 
and Rue (2007) to also allow exact draws from f(y\9) 

The minimum lag representation for an autologistic lattice with a first order neighbour¬ 
hood occurs for r given by the smaller of the number of rows or columns in the lattice. Iden¬ 
tifying the number of rows with the smaller dimension of the lattice, the computation time 
increases by a factor of two for each additional row, but linearly for additional columns, ft is 
straightforward to extend this algorithm to allow one to compute the normalizing constant 
in (5), so that the summation is over the variables yA, : and each factor involves conditioning 
on the set V-a,- 
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4 Bayesian composite likelihood adjustments 

Approximating the true posterior distribution by remplacing the true likelihood by the com¬ 
posite likelihood leads to misspecihcation in the mean and variance of approximate posterior 
distribution as shown in Figure 1. The aim of the following Section is to establish identities 
that links the gradient and the Hessian of the log-posterior for 9 to the moments of sufficient 
statistics with respect to the distribution of the Gibbs random held, whereupon we use these 
identities to calibrate the weights w L in (4). 

4.1 An estimation of the gradient and curvature of the posterior 
distribution 

Using (3) as a starting point, we can write the gradient of the log-posterior for 9 as 

Vlogp(d | y) = s(y ) — Vz{9^) + Vlog p(9). 

It is straightforward to show that 


Vz(6,&)=-E y \ 0 s(y) } 

hence the gradient of the log-posterior for 9 can be written as a sum of moments of s(y), 
namely 

V logp (6\y) = s(y ) - E„| e s(j/) + V logp(0). (8) 

Taking the partial derivatives of the previous expression yields similar identity for the Hessian 
matrix of the log-posterior for 9 , 


Hlog p{9 | y) = —K y \o(s(y)) + Hlogp(0), (9) 

where ~K y \o(s(y)) denotes the covariance matrix of s(y) when y has distribution f (y \ 9). 
Similar to (8) and (9), one can express the gradient and Hessian of the log-posterior logpciX# | 
y) in terms of moments of the sufficient statistics. 


4.2 Mean adjustment 

The mean adjustment aims to ensure that the posterior and the approximated posterior 
distributions have the same maximum. Thus, the adjustment here is simply the substitution 

Pcl(0 | y ) = Pcl(0 -9* + 9 * cL | y), 

where 9* and 9^ L , is the maximum a posteriori (MAP) of the posterior distribution p{9 \ y) 
and the approximated posterior distribution Pcl($ I p), respectively. 

Addressing the issue of estimation of 9* and 9q L , we note generally from equation (9) 
that log p{9 | y ) and logpciX^ | y ) are not concave functions. However the Hessian of the 
log-likelihood is a semi-negative matrix and so is unimodal. A reasonable choice of prior, 
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for example with a semi-negative Hessian matrix, will thus lead to a unimodal posterior 
distribution. Care must be taken to ensure convergence of the optimisation algorithms to 
9* and 9q l . In particular, we remark that since the approximate posterior distribution 
is typically very sharp around the MAP, as shown in Figure 1, it can be difficult to ensure 
convergence of gradient based algorithms in reasonable computational time. However, in our 
experiments we have found that using a BFGS algorithm which is based on a Hessian matrix 
approximation using rank-one updates calculated from approximate gradient evaluations, 
provided good performance in our context. Note that in practice, the gradient evaluated in 
the algorithm is stochastic and based on a standard Monte Carlo estimator of the expectation 
E j/| eSj(y). 


Algorithm 1: MAP estimation 
Input: A lattice y 

Output: Estimators 6* of 9* and 9* Ch of 9* Gli 

estimate 9* Ch using a BFGS algorithm based on Monte Carlo estimator of 
Vlog pcl{9 | y); 

estimate 9* using a BFGS algorithm based on Monte Carlo estimator of 
V log pc\j(9 | y) and starting from 9q L ] 
return 9* and 9 * cL ; 


Estimating 9* using a random initialization point in BFGS algorithm (see Algorithm 1) 
is inefficient. Indeed, estimating E y | 0 s(y) is the most cumbersome part of the algorithm and 
should be done as little as possible. Despite that 9* Ch is not equal to 9* it is usually close 
and turns out to yield a good initialization to the second BFGS algorithm. 

4.3 Magnitude adjustment 

The general approach we propose to adjust the covariance of the approximated posterior is 
to temper the conditional composite likelihood with some weights W{ in order to modify its 
curvature around the mode. We remark that the curvature of a scalar field at its maximum 
is directly linked to the Hessian matrix. Based on that observation, our proposal is to choose 
Wi such that 

H logy( 6 ** | y) = U logp CL (9* CL \ y ). 

Note in our context, there exists no particular reason to weight each blocks differently. 
Consequently we assume that each block has the same weight and we denote it w. 

For the sake of simplicity, assume a uniform prior but everything can be easily written 
for any prior. When 9 is a scalar parameter, writing identity (9) for p (9 \ y) and pch{9 \ y) 
yields 

w= —-. (10) 

£i=i Var ^|e £L (s(y Ai | y_ A J) 
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However this approach does not apply when dealing with autologistic models since 9 e 
is a vector. We thus have a scalar constraint for an equality between the two matrices 

c 

K . v r(X2 /)) = w E k #c L ( s (^. I y-Ai))- 

1=1 

In Table 1 we consider some possible identities that are natural to consider in order to choose 
a reasonable value for w. The options and w ^ include only the information contained 
in the diagonal of each matrix whereas options u/ 1 ', and take advantage of all the 
information of the covariance matrix. 

Table 1: Weight options for a magnitude adjustment when 6 G M. d 


w 


(i). 


det [K y | e »(s(y))] 


det 


I V-Ai)} 


1/d 


w ( 2 k -tr 


C 


K y\e*(s(y)) X] K J/I 0 £l ( S ( y Ai I y-Ai)) 


v.z=l 


w 


(3). 1 


w 


E 


Var y | e *(si(y)) 


(4). 


d E l =iVar 2/ | 0 * L (s : ,( 2 / Ai I y_ Ai )) 
tr [K yle ,(s(y))] 


tr 


I V—Ai )) 


w 


(5). 


1 


tr 


(*(!/)) 


tr I (n?= 1 K y\eu(s{yAi I y-Ai 


4.4 Curvature adjustment 

The adjustment presented in the previous Section only modify the magnitude of the approx¬ 
imated posterior but do not affect its geometry. The weight w similarly affects each direction 
of space parameters and does not take into account a possible modification of the correlation 
between the variables induced by the use of a composite likelihood approximation. We ex¬ 
pect this phenomenon to be particularly important when dealing with models where there is 






















a potential on singletons such as the autologistic model. Indeed estimation of the abundance 
parameter and interaction parameter, 9 o and 9 1 , respectively, do not suffer from the same 
level of approximation relating to the independence assumption between blocks. Thus we 
should move from the general form (4) with a scalar weight on blocks to one involving a 
matrix of weights. 

Following Ribatet et al. (2012) in the context of marginal composite likelihood, our 
strategy is to write 

f(y\0)^fcL(y\9* CL + W(9-9* CL )), 

for some constant d x d matrix W. Note the substitution keeps the same maximum but 
deforms the geometry of the parameter space through the matrix W. 

Assume that IT is a lower triangular matrix in order to take into account the correlation 
between the parameter components. The suggestion of Ribatet et al. (2012) is to choose W 
in order to satisfy asymptotic properties of maximum composite likelihood estimators when 
the sample size tends to infinity. Since we only have one observation, we do not focus on the 
asymptotic covariance matrix results but rather on the covariance matrix at the estimated 
MAP. Indeed, we follow the same approach introduced in Section 4.3, 

H logp(0 | y)\ e=$ . = H logj) OL (0J L + W{8 - 0J L ) | y)\ 0=e ,^ , 
which is equivalent to 

Hlogp(r | y) = IU T H logp CL (9q L \ y)W. 

This leads to a system of equations that can be easily solved. The problem of uniqueness 
faced by Ribatet et al. (2012) due to a Cholesky decomposition does not raise any issues. 
Since we have access to a close form of different Hessians through Monte Carlo estimators, 
solutions are easy to compute and perform the same way. 

5 Examples 

In this numerical part of the paper, we focus on models defined on a 16 x 16 lattice and we 
use exhaustively all 4x4 blocks. For the lattice of this dimension the recursions proposed 
by Friel and Rue (2007) can be used to compute exactly the normalizing constants z[9^\ 
z^dj&jyAi) and to draw exactly from the distribution / (y \ 9) or from the full-conditional 
distributions of A* fid/Ai \ U-Ai , #)• This exact computation of the posterior serves as a 
ground truth against which to compare with the posterior estimates of 9 using the various 
composite likelihood estimators. Computation was carried out on a desktop PC with six 
3.47Ghz processors and with 8Gb of memory. Computing the normalizing constant of each 
block took 0.0004 second of CPU time. One iteration of the BFGS algortihm took 0.09 
seconds to estimate the MAP of the composite likelihood and 1 second to estimate the MAP 
of true likelihood. The weight calibration for one dataset took approximately three minutes. 
Note that for more realistic situations involving larger lattices, one requires a sampler to 
draw from the full likelihood such as the Swendsen-Wang algorithm (Swendsen and Wang, 
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1987), however the computational cost of using this algorithm increases dramatically with 
the size of the lattice. One possible alternative is the slice sampler of Mira et al. (2001) that 
provides exact simulations of Ising models. 

In each experiment, we simulated 100 realisations from the model. For each realisation, 
we use the BFGS algorithm 1 with an adhoc stopping condition to get the estimators 6* and 
^cl- One iteration of the algorithm is based on a Monte Carlo estimator of either E y \os(y) or 
Ejm | e s {VAi | y-Ai,0) calculated from 100 exact draws whereas the Monte Carlo estimators of 
the covariance matrix K ,g-, {s(y)) and K^* ( s(yAi \ U-Ai )) are based on 50000 exact draws. 
In all experiments we placed uniform priors on 9. 

Comparing the posterior p(9 \y) with the various posterior approximations pch{9 \ V) 
requires knowledge of the covariance matrix of 6. We could have used numerical integra¬ 
tion but we prefered to use a simple MCMC algorithm. In terms of implementation, 7000 
iterations were used with a burn in period of 2000 iterations for each dataset. 

First experiment We considered the special case of a first-order Ising model with a 
single interaction parameter 9 = 0.4, which is close to the critical phase transition beyond 
which all realised lattices takes either value +1 or -1. This parameter setting is the most chal¬ 
lenging for the Ising model, since realised lattices exhibit strong spatial correlation around 
this parameter value. Using a fine grid of {9k} values, the right hand side of: 

P{6k | y) oc Q ^ y ( [ d ^ p{9 k ), k = 1,..., n, 

can be evaluated exactly. Summing up the right hand side - using the trapezoidal rule - 
yields an estimate of the evidence, p(y), which is the normalizing constant for the expression 
above and which in turn can be used to give a very precise estimate of p{9 \ y ). The plot 
so obtained for the posterior and posteriror approximations are given by Figure 1(a). On 
this example it should be clear that using an un-calibrated conditional composite likelihood 
leads to considerably underestimated posterior variances. But once we perform the mean 
adjusment and the magnitude adjusment, this provides a very good approximation of the 
true posterior. In Figure 1(b) we display the ratio Kcl(#)/K($), where K(6 I ), respectively 
Kcl (9), denotes the variance of the posterior, respectively the posterior approximation, for 
9, based on 100 realisations of a first-order Ising model. In view of these results there is no 
question that the magnitude adjustment (10) provides an efficient correction of the variance. 

Table 2 confirms this result through evaluation of the relative mean square error, that is 
E [(1 — Kcl(^)/K(6 1 )) 2 ], and the average KL-divergence between the approximated posterior 
and true posterior distributions based on 100 realisations of a first order. 

Second experiment We were interested in an anisotropic configuration of a first-order 
Ising model. We set 6 = (0.3, 0.5). The evidence p(y) is here estimated with an importance 
sampling method. We drew 1000 points using a Gaussian law whose moments are related 
to the Monte Carlo estimators of moments of 6. Figure 2(a) and Figure 2(b) represent a 
comparison between the true likelihood and the estimates. As for the isotropic case, the 
mean and the magnitude adjustment allows us to build an accurate approximation of the 
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Ppseudo(0/y) PcL(0/y) (W= 1) PcL(0/y) (w calibrated) 

(b) 

Figure 1: First experiment results, (a) Posterior distribution and posterior distribution 
approximations for 9 of a first-order Ising model, (b) Boxplot displaying the ratio of the 
variances Kcl($)/K( 0) for 100 realisations of a first-order Ising model. 

Table 2: Evaluation of the relative mean square error (RMSE) and the average KL-divergence 
(AKLD) between the approximated posterior and true posteriror distributions based on 100 
simulations of a first-order Ising model. 

COMP. LIKELIHOOD RMSE AKLD 


Ppseudo {0 | y) 1-96 0.510 

Pcl(# | y) (w = 1) 0.757 0.337 

Pcl (6 | y) (w defined by (10)) 0.040 0.010 



posterior. In Figure 2(c) we display boxplots, based on 100 realisations of an anisotropic 
first-order Ising model, of the ratio ||Kcl(#)K _1 (#)||f/\/2, where || • ||p denotes the Frobenius 
norm. The different weight options are almost equivalent in term of variance correction. The 
weight W 5 seems to be the most informative. It should not be a surprise since it is based on 
the Frobenius norm which carries information of the matrix and its singular values. 

This conclusion is emphasized in Table 3 which presents the relative mean square error 
E [||1 — Kcl($)K - 1 (6 , )|H] and the average KL-divergence between the approximate and true 
posterior distributions for 100 realisations of the model. 

Third experiment Here we focused on an autologistic model with a first-order de- 
pendance structure. The abundance parameter was set to 0 O = 0.05 and the interaction 
parameter to 9\ = 0.4. The differents implementation settings are exactly the same as for 
the second experiment. This example illustrates how the use of composite likelihood approx¬ 
imation can induce a modification of the geometry of the distribution as shown in Figure 
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Figure 2: Second experiment results. (a) Posterior distribution and posterior distribu¬ 
tion approximation based on the conditional composite likelihood with w — 1. (b) Poste¬ 
rior distribution and posterior distribution approximation based on the conditional compos¬ 
ite likelihood with mean and magnitude adjustments {w = w^). (c) Boxplot displaying 
||K cl (6 i )K~ 1 (6 ) )||f/v / 2 for 100 realisations of an anisotropic first-order Ising model. 
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Table 3: Evaluation of the relative mean square error (RMSE) the average KL-divergence 
(AKLD) between the approximated posterior and true posteriror distributions based on 100 
simulations of an anisotropic first-order Ising model. 

COMP. LIKELIHOOD RMSE AKLD 


Pcl(0 

\y) 

(w 

= 1) 

1.28 

2.25 

Pcl(0 

1 y) 

(w 

= W^) 

0.555 

0.067 

Pcl(0 

1 y) 

(w 

= w^>) 

0.583 

0.066 

Pcl{0 

1 y) 

( w 

= w^>) 

0.540 

0.071 

Pcl(0 

1 y) 

(w 

= u/ 4 >) 

0.551 

0.061 

Pcl(0 

1 y) 

(w 

= w^) 

0.525 

0.079 


3(a). Indeed in addition to the mean and variance misspecification the conditional compos¬ 
ite likelihood also changes the correlation between the variables. It should be evident that 
a magnitude adjustent would not be fruitful here since it would not affect the correlation. 
Instead the curvature adjustment manages to do so and thus yields a good approximation 
of the posterior, see Figure 3(b). One can object that we do not detect tail of the posterior. 
But Figure 3(c) and Table 4 show that the adjustment yields an efficient correction of the 
variance. 

Table 4: Evaluation of the relative mean square error (RMSE) and the average KL-divergence 
(AKLD) between the approximated posterior and true posteriror distributions based on 100 
simulations of a first-order autologistic model. 

COMP. LIKELIHOOD RMSE AKLD 


Pcl(0 | y)(w = 1) 3.44 2.38 

Pcl(O* C l + W(0-9* cl ) | y) 0.96 1.89 


6 Conclusion 

This paper has illustrated the important role that conditional composite likelihood approx¬ 
imations can play in the statistical analysis of Gibbs random fields, and in particular in the 
Ising and autologistic models in spatial statistics, as a means to overcoming the intractability 
of the likelihood function. However using composite likelihoods in a Bayesian setting can be 
problematic, since the resulting approximate posterior distribution is typically too concen¬ 
trated and therefore underestimates the posterior mean and variance. Our main contribution 
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(b) 




Figure 3: Third experiment results, (a) Posterior distribution and posterior distribution 
approximation based on the conditional composite likelihood with w — 1. (b) Posterior dis¬ 
tribution and posterior distribution approximation based on the conditional composite like¬ 
lihood with mean and curvature adjustments, (c) Boxplot displaying ||K cl (6 , )K _1 (6 ) )|| f /a/2 
for 100 realisations of a first-order autologistic model. 


has been to show how to calibrate the approximate posterior distribution that results from 
replacing the true likelihood with a conditional composite likelihood. Further work will focus 
on how to extend this framework to Gibbs random fields with larger number of parameters, 
such as the exponential random graph model. 
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